120 research outputs found

    On Maximum Margin Hierarchical Classification

    We present work in progress towards maximum margin hierarchical classification where the objects are allowed to belong to more than one category at a time. The classification hierarchy is represented as a Markov network equipped with an exponential family defined on the edges. We present a variation of the maximum margin multilabel learning framework that is suited to the hierarchical classification task and allows efficient implementation via gradient-based methods. We compare the behaviour of the proposed method to the recently introduced hierarchical regularized least squares classifier as well as two SVM variants in Reuters news article classification.

    Biomarker Discovery by Sparse Canonical Correlation Analysis of Complex Clinical Phenotypes of Tuberculosis and Malaria

    Biomarker discovery aims to find small subsets of relevant variables in ‘omics data that correlate with the clinical syndromes of interest. Despite the fact that clinical phenotypes are usually characterized by a complex set of clinical parameters, current computational approaches assume univariate targets, e.g. diagnostic classes, against which associations are sought. We propose an approach based on asymmetrical sparse canonical correlation analysis (SCCA) that finds multivariate correlations between the ‘omics measurements and the complex clinical phenotypes. We correlated plasma proteomics data to multivariate overlapping complex clinical phenotypes from tuberculosis and malaria datasets. We discovered relevant ‘omic biomarkers that have a high correlation to profiles of clinical measurements and are remarkably sparse, containing 1.5–3% of all ‘omic variables. We show that using clinical view projections we obtain remarkable improvements in diagnostic class prediction, up to 11% in tuberculosis and up to 5% in malaria. Our approach finds proteomic biomarkers that correlate with complex combinations of clinical biomarkers. Using the clinical biomarkers improves the accuracy of diagnostic class prediction while not requiring the measurement of plasma proteomic profiles of each subject. Our approach makes it feasible to use ‘omics data to build accurate diagnostic algorithms that can be deployed to community health centres lacking the expensive ‘omics measurement capabilities.

    Improving the Nutrient Content of Food through Genetic Modification: Evidence from Experimental Auctions on Consumer Acceptance

    This paper assesses consumers’ acceptance of nutritionally enhanced vegetables using a series of auction experiments administered to a random sample of adult consumers. Evidence suggests that consumers are willing to pay significantly more for fresh produce with labels signaling enhanced levels of antioxidants and vitamin C achieved by moving genes from within the species, as opposed to across species. However, this premium is significantly affected by the diverse information treatments introduced into the experiments.

    CamOptimus: a tool for exploiting complex adaptive evolution to optimize experiments and processes in biotechnology

    Multiple interacting factors affect the performance of engineered biological systems in synthetic biology projects. The complexity of these biological systems means that experimental design should often be treated as a multiparametric optimization problem. However, the available methodologies are either impractical, due to a combinatorial explosion in the number of experiments to be performed, or are inaccessible to most experimentalists due to the lack of publicly available, user-friendly software. Although evolutionary algorithms may be employed as alternative approaches to optimize experimental design, the lack of simple-to-use software again restricts their use to specialist practitioners. In addition, the lack of subsidiary approaches to further investigate critical factors and their interactions prevents the full analysis and exploitation of the biotechnological system. We have addressed these problems and, here, provide a simple-to-use and freely available graphical user interface to empower a broad range of experimental biologists to employ complex evolutionary algorithms to optimize their experimental designs. Our approach exploits a Genetic Algorithm to discover the subspace containing the optimal combination of parameters, and Symbolic Regression to construct a model to evaluate the sensitivity of the experiment to each parameter under investigation. We demonstrate the utility of this method using an example in which the culture conditions for the microbial production of a bioactive human protein are optimized. CamOptimus is available through: https://doi.org/10.17863/CAM.10257. Funding: EU 7th Framework Programme (BIOLEDGE Contract No: 289126 to S. G. O and J. R), BBSRC (BRIC2.2 to S. G. O. and N. K. H. S.), Synthetic Biology Research Initiative Cambridge (SynBioFund to D. D., A. C. C. and J. M. L. D.)

    metaCCA : summary statistics-based multivariate meta-analysis of genome-wide association studies using canonical correlation analysis

    Motivation: A dominant approach to genetic association studies is to perform univariate tests between genotype-phenotype pairs. However, analyzing related traits together increases statistical power, and certain complex associations become detectable only when several variants are tested jointly. Currently, modest sample sizes of individual cohorts and restricted availability of individual-level genotype-phenotype data across the cohorts limit the ability to conduct multivariate tests. Results: We introduce metaCCA, a computational framework for summary statistics-based analysis of a single or multiple studies that allows multivariate representation of both genotype and phenotype. It extends the statistical technique of canonical correlation analysis to the setting where original individual-level records are not available, and employs a covariance shrinkage algorithm to achieve robustness. Multivariate meta-analysis of two Finnish studies of nuclear magnetic resonance metabolomics by metaCCA, using standard univariate output from the program SNPTEST, shows excellent agreement with the pooled individual-level analysis of original data. Motivated by strong multivariate signals in the lipid genes tested, we envision that multivariate association testing using metaCCA has great potential to provide novel insights from already published summary statistics from high-throughput phenotyping technologies. Peer reviewed.

    The Impact of Free Trial Acceptance on Demand for Alternative Nicotine Products: Evidence from Experimental Auctions

    Objectives: This study explored the relationship between product trials and consumer demand for alternative nicotine products (ANP). Methods: An experimental auction was conducted with 258 adult smokers, wherein participants were randomly assigned to one of four experimental conditions. The participants received the opportunity to try, but did not have to accept, one of three relatively novel ST products (i.e., snus, dissolvable tobacco, or medicinal nicotine), or they were placed into a control group (i.e., no trial). All the participants then bid on all three of these products, as well as on cigarettes. We assessed interest in using ANP based on both trial of the product and bids placed for the products in the experimental auction. Results: Fewer smokers were willing to try snus (44%) than dissolvable tobacco (64%) or medicinal nicotine (68%). For snus, we find modest evidence suggesting that willingness to try is associated with greater demand for the product. For dissolvable tobacco or medicinal nicotine, we find no evidence that those who accept the product trial have higher demand for the product. Conclusions: Free trials of a novel ANP were not strongly associated with product demand, as assessed by willingness to pay. Given the debate over the potential for ANP to reduce the harm from smoking, these results are important in understanding the impact of free trial offers on adoption of ST products as a strategy to reduce harm from tobacco use.

    Learning with multiple pairwise kernels for drug bioactivity prediction

    Motivation: Many inference problems in bioinformatics, including drug bioactivity prediction, can be formulated as pairwise learning problems, in which one is interested in making predictions for pairs of objects, e.g. drugs and their targets. Kernel-based approaches have emerged as powerful tools for solving problems of that kind, and multiple kernel learning (MKL) in particular offers promising benefits, as it enables integrating various types of complex biomedical information sources in the form of kernels, along with learning their importance for the prediction task. However, the immense size of pairwise kernel spaces remains a major bottleneck, making the existing MKL algorithms computationally infeasible even for a small number of input pairs. Results: We introduce pairwiseMKL, the first method for time- and memory-efficient learning with multiple pairwise kernels. pairwiseMKL first determines the mixture weights of the input pairwise kernels, and then learns the pairwise prediction function. Both steps are performed efficiently without explicit computation of the massive pairwise matrices, therefore making the method applicable to solving large pairwise learning problems. We demonstrate the performance of pairwiseMKL in two related tasks of quantitative drug bioactivity prediction using up to 167 995 bioactivity measurements and 3120 pairwise kernels: (i) prediction of anticancer efficacy of drug compounds across a large panel of cancer cell lines; and (ii) prediction of target profiles of anticancer compounds across their kinome-wide target spaces. We show that pairwiseMKL provides accurate predictions using sparse solutions in terms of selected kernels, and therefore it also automatically identifies data sources relevant for the prediction problem.